[Devel] [PATCH RHEL7 COMMIT] ms/fs: Add user namespace member to struct super_block

Konstantin Khorenko khorenko at virtuozzo.com
Tue Jul 11 18:39:36 MSK 2017


The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.33.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.33.3
------>
commit 1352424aae378f285da1b36719e86b5069c693c2
Author: Eric W. Biederman <ebiederm at xmission.com>
Date:   Tue Jul 11 19:39:36 2017 +0400

    ms/fs: Add user namespace member to struct super_block
    
    Patchset description:
    fs: translate uids/gids against current user namespace's mapping
    
    We want to configure non-default user namespace mappings for Containers,
    but still want to store uids/gids of files relative to the Container's user ns
    mapping.
    
    The solution is to store a link to the user_ns in the super block at fs mount
    time and to use that user_ns mapping for all later inode uid/gid mapping.
    
    Notes:
    1) acl should also behave in the same way, not tested yet
    2) mainstream has disabled quota for non-init user_ns:
       5c00482 ("dquot: For now explicitly don't support filesystems outside of
       init_user_ns")
       We need quota working inside a Container, so I did not apply the patch,
       but the quota code has to be reviewed additionally.
    
    Eric W. Biederman (5):
      ms/fs: Add user namespace member to struct super_block
      ms/vfs: Verify acls are valid within superblock's s_user_ns.
      ms/vfs: Don't modify inodes with a uid or gid unknown to the vfs
      ms/vfs: Don't create inodes with a uid or gid unknown to the vfs
      ms/quota: Ensure qids map to the filesystem
    
    Konstantin Khorenko (1):
      proc: use proper user_ns for mount
    
    Seth Forshee (5):
      ms/fs: Refuse uid/gid changes which don't map into s_user_ns
      ms/fs: Check for invalid i_uid in may_follow_link()
      ms/cred: Reject inodes with invalid ids in set_create_file_as()
      ms/fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
      ms/vfs: open() with O_CREAT should not create inodes with unknown ids
    
    https://jira.sw.ru/browse/PSBM-40075
    
    ===============================================================
    This patch description:
    
    Start marking filesystems with a user namespace owner, s_user_ns.  In
    this change this is only used for permission checks of who may mount a
    filesystem.  Ultimately s_user_ns will be used for translating ids and
    checking capabilities for filesystems mounted from user namespaces.
    
    The default policy for setting s_user_ns is implemented in sget(),
    which arranges for s_user_ns to be set to current_user_ns() and to
    ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
    user_ns.
    
    The guts of sget are split out into another function sget_userns().
    The function sget_userns calls alloc_super with the specified user
    namespace or it verifies the existing superblock that was found
    has the expected user namespace, and fails with EBUSY when it is not.
    This failure prevents users with the wrong privileges from mounting a
    filesystem.
    
    The reason for the split of sget_userns from sget is that in some
    cases such as mount_ns and kernfs_mount_ns a different policy for
    permission checking of mounts and setting s_user_ns is necessary, and
    the existence of sget_userns() allows those policies to be
    implemented.
    
    The helper mount_ns is expected to be used for filesystems such as
    proc and mqueuefs which present per namespace information.  The
    function mount_ns is modified to call sget_userns instead of sget to
    ensure the user namespace owner of the namespace whose information is
    presented by the filesystem is used on the superblock.
    
    For sysfs and cgroup the appropriate permission checks are already in
    place, and kernfs_mount_ns is modified to call sget_userns so that
    the init_user_ns is the only user namespace used.
    
    For the cgroup filesystem cgroup namespace mounts are bind mounts of a
    subset of the full cgroup filesystem and as such s_user_ns must be the
    same for all of them as there is only a single superblock.
    
    Mounts of sysfs that vary based on the network namespace could in principle
    change s_user_ns but it keeps the analysis and implementation of kernfs
    simpler if that is not supported, and at present there appear to be no
    benefits from supporting a different s_user_ns on any sysfs mount.
    
    Getting the details of setting s_user_ns correct has been
    a long process.  Thanks to Pavel Tikhomirov who spotted a leak
    in sget_userns.  Thanks to Seth Forshee who has kept the work alive.
    
    Thanks-to: Seth Forshee <seth.forshee at canonical.com>
    Thanks-to: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    Acked-by: Seth Forshee <seth.forshee at canonical.com>
    Signed-off-by: Eric W. Biederman <ebiederm at xmission.com>
    
    https://jira.sw.ru/browse/PSBM-40075
    
    (cherry picked from commit 6e4eab577a0cae15b3da9b888cff16fe57981b3e)
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    Conflicts:
    	fs/kernfs/mount.c
    	fs/super.c
    
    Changes:
    - no kernfs in vz7 kernel => dropped hunk
    - user_ns is not passed to mount_ns => use current_user_ns() there
      ("d91ee87 vfs: Pass data, ns, and ns->userns to mount_ns")
---
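Note (not part of the commit): the sget()/sget_userns() split described in
the commit message can be modelled in userspace roughly as below. This is a
minimal sketch, not kernel code. The names model_sget, model_sget_userns,
struct super, struct fs_type and struct user_ns are hypothetical stand-ins,
the integer key replaces the test() callback, and the has_cap_sys_admin and
kern_mount flags stand in for ns_capable(user_ns, CAP_SYS_ADMIN) and
MS_KERNMOUNT. Locking, the retry loop and reference counting are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel types (assumptions for illustration). */
struct user_ns {
	int id;
};

struct super {
	int key;                 /* what test() would compare in the kernel */
	struct user_ns *user_ns; /* models sb->s_user_ns */
	struct super *next;
};

struct fs_type {
	struct super *supers;    /* models type->fs_supers */
};

/*
 * Models sget_userns(): an existing superblock that matches the key is
 * reused only if it is owned by the requested namespace; a mismatch is
 * reported as -EBUSY. If nothing matches, a new superblock is created
 * and tagged with the requested namespace.
 */
static struct super *model_sget_userns(struct fs_type *type, int key,
				       struct user_ns *ns, int *err)
{
	struct super *s;

	assert(type && ns && err);
	*err = 0;

	for (s = type->supers; s; s = s->next) {
		if (s->key != key)
			continue;
		if (s->user_ns != ns) {
			*err = -EBUSY;
			return NULL;
		}
		return s;
	}

	s = calloc(1, sizeof(*s));
	if (!s) {
		*err = -ENOMEM;
		return NULL;
	}
	s->key = key;
	s->user_ns = ns;
	s->next = type->supers;
	type->supers = s;
	return s;
}

/*
 * Models sget(): the default policy. The caller's namespace becomes the
 * owner, and a mount that is not a kernel-internal mount is refused with
 * -EPERM unless the caller holds CAP_SYS_ADMIN in that namespace.
 */
static struct super *model_sget(struct fs_type *type, int key,
				struct user_ns *current_ns,
				bool has_cap_sys_admin, bool kern_mount,
				int *err)
{
	assert(err);
	if (!kern_mount && !has_cap_sys_admin) {
		*err = -EPERM;
		return NULL;
	}
	return model_sget_userns(type, key, current_ns, err);
}
```

In this model, a second model_sget() call with the same key and namespace
returns the existing entry, while the same key requested from a different
namespace yields -EBUSY, which is the case the new check in the lookup loop
of sget_userns() handles.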
 fs/super.c         | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h | 12 ++++++++++++
 2 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 7470621..3e067e1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -35,6 +35,7 @@
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
 #include <linux/memcontrol.h>
+#include <linux/user_namespace.h>
 #include "internal.h"
 
 const unsigned super_block_wrapper_version = 0;
@@ -171,6 +172,7 @@ static void destroy_super(struct super_block *s)
 		percpu_counter_destroy(&s->s_writers.counter[i]);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
+	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	kfree(s);
@@ -180,11 +182,13 @@ static void destroy_super(struct super_block *s)
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
  *	@flags: the mount flags
+ *	@user_ns: User namespace for the super_block
  *
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags)
+static struct super_block *alloc_super(struct file_system_type *type, int flags,
+				       struct user_namespace *user_ns)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block_wrapper),  GFP_USER);
 	static const struct super_operations default_op;
@@ -194,6 +198,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 		return NULL;
 
 	INIT_LIST_HEAD(&s->s_mounts);
+	s->s_user_ns = get_user_ns(user_ns);
 
 	if (security_sb_alloc(s))
 		goto fail;
@@ -454,17 +459,18 @@ void generic_shutdown_super(struct super_block *sb)
 EXPORT_SYMBOL(generic_shutdown_super);
 
 /**
- *	sget	-	find or create a superblock
+ *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
  *	@flags:	mount flags
+ *	@user_ns: User namespace for the super_block
  *	@data:	argument to each of them
  */
-struct super_block *sget(struct file_system_type *type,
+struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			int flags,
+			int flags, struct user_namespace *user_ns,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -477,6 +483,14 @@ struct super_block *sget(struct file_system_type *type,
 		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
 			if (!test(old, data))
 				continue;
+			if (user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				if (s) {
+					up_write(&s->s_umount);
+					destroy_super(s);
+				}
+				return ERR_PTR(-EBUSY);
+			}
 			if (!grab_super(old))
 				goto retry;
 			if (s) {
@@ -489,7 +503,7 @@ struct super_block *sget(struct file_system_type *type,
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags);
+		s = alloc_super(type, flags, user_ns);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -512,6 +526,31 @@ struct super_block *sget(struct file_system_type *type,
 	return s;
 }
 
+EXPORT_SYMBOL(sget_userns);
+
+/**
+ *	sget	-	find or create a superblock
+ *	@type:	  filesystem type superblock should belong to
+ *	@test:	  comparison callback
+ *	@set:	  setup callback
+ *	@flags:	  mount flags
+ *	@data:	  argument to each of them
+ */
+struct super_block *sget(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags,
+			void *data)
+{
+	struct user_namespace *user_ns = current_user_ns();
+
+	/* Ensure the requestor has permissions over the target filesystem */
+	if (!(flags & MS_KERNMOUNT) && !ns_capable(user_ns, CAP_SYS_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	return sget_userns(type, test, set, flags, user_ns, data);
+}
+
 EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
@@ -925,7 +964,8 @@ struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
 {
 	struct super_block *sb;
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, flags, data);
+	sb = sget_userns(fs_type, ns_test_super, ns_set_super, flags,
+			 current_user_ns(), data);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 06892d6..6b509f8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1572,6 +1572,13 @@ struct super_block {
 	RH_KABI_EXTEND(struct workqueue_struct *s_dio_done_wq)
 
 	/*
+	 * Owning user namespace and default context in which to
+	 * interpret filesystem uids, gids, quotas, device nodes,
+	 * xattrs and security labels.
+	 */
+	struct user_namespace *s_user_ns;
+
+	/*
 	 * Keep the lru lists last in the structure so they always sit on their
 	 * own individual cachelines.
 	 */
@@ -2278,6 +2285,11 @@ void put_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_userns(struct file_system_type *type,
+			int (*test)(struct super_block *,void *),
+			int (*set)(struct super_block *,void *),
+			int flags, struct user_namespace *user_ns,
+			void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),


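Note (not part of the patch): the comment added to struct super_block calls
s_user_ns the default context in which to interpret filesystem uids and
gids. This patch itself only uses the field for mount permission checks; the
translation is introduced by later patches in the series. As a rough
illustration of what interpreting an id against a namespace mapping means,
here is a simplified userspace sketch of extent-based id mapping. The names
id_extent, id_map, map_down, map_up and INVALID_ID are hypothetical; in the
kernel the corresponding work is done by helpers such as make_kuid() and
from_kuid().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* One contiguous range of ids (an assumption modelled on uid_map lines). */
struct id_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* matching first id in the parent namespace */
	uint32_t count;       /* number of ids in the range */
};

struct id_map {
	size_t nr;
	struct id_extent ext[5];
};

/* Translate an id seen inside the namespace to the parent's view. */
static uint32_t map_down(const struct id_map *m, uint32_t id)
{
	assert(m);
	for (size_t i = 0; i < m->nr; i++) {
		const struct id_extent *e = &m->ext[i];

		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID; /* id has no mapping in this namespace */
}

/* Translate an id from the parent's view back into the namespace. */
static uint32_t map_up(const struct id_map *m, uint32_t id)
{
	assert(m);
	for (size_t i = 0; i < m->nr; i++) {
		const struct id_extent *e = &m->ext[i];

		if (id >= e->lower_first && id - e->lower_first < e->count)
			return e->first + (id - e->lower_first);
	}
	return INVALID_ID;
}
```

With a single extent { 0, 100000, 65536 }, an id of 1000 inside the
namespace corresponds to 101000 in the parent, and ids outside the range
have no mapping. An id without a mapping is the "unknown to the vfs" case
that other patches in the series refuse to create or modify.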