[CRIU] [PATCH 1/2] mount: handle tracefs more gracefully

Tycho Andersen tycho.andersen at canonical.com
Tue May 17 14:04:23 PDT 2016


On Tue, May 17, 2016 at 10:37:22AM -0700, Andrew Vagin wrote:
> Hi Tycho,
> 
> Honestly, I don't like this hack. I think it will not work in all cases.
> 
> For example, we create a clean mount, if an required path is
> overmounted:
> 
>         if (&c->siblings != &mi->bind->children) {
>                 /* Get a copy of mi->bind without child mounts */
>                 if (mount(mnt_path, mnt_clean_path, NULL, MS_BIND, NULL)) {
>                         pr_perror("Unable to bind-mount %s to %s",
>                                         mi->bind->mountpoint, mnt_clean_path);
>                 }
>                 mnt_path = mnt_clean_path;
>                 umount_mnt_path = true;
>         }
> 
> so if we will create a clean mount for debugfs, we will lost tracefs.

The problem is that for unprivileged containers, the kernel does some
security checks to make sure that stuff that's overmounted isn't
exposed by a new mount, so you'll get an EPERM by trying to bind
/sys/kernel/debug, because that will expose /sys/kernel/debug/tracing.

> Do you bind-mount debugfs from host in a container? If it's yes, you probably
> use --mnt-ext-map options for it. Maybe we can add recursive external mounts
> or something like this.

I don't think being explicitly doing --ext-mnt-map will help, as it'll
still try to mount things in the wrong order. We could add an
--ext-mount-map-rec or something.

Actually I think we can drop the tracefs hunk in proc-parse.c if you
want, so long as we have a way to ignore some mounts. The problem here
really is that unprivileged users can't see into /sys/kernel/debug, so
they get EPERM when binding /sys/kernel/debug/tracing.

I guess this brings up a more general question of: what do we do when
the source of a bind mount isn't visible to unprivileged users?

Tycho

> On Mon, May 16, 2016 at 10:30:47PM +0000, Tycho Andersen wrote:
> > See the comment for details, but basically tracefs is automounted by the
> > kernel, so we can just mount debugfs with MS_REC and get the right result.
> > 
> > v2: rebase on criu-dev
> > v3: don't use a new fstype->flags, just always set MS_REC in debugfs'
> >     ->parse
> > 
> > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > ---
> > This is kind of similarly ugly to to a cgroup patch I sent earlier, where
> > we needed some filesystem specific code and checks in various parts of
> > mount.c. I somehow can't figure out a nicer way to do this, but I am
> > definitely open to suggestions.
> > ---
> >  criu/mount.c      | 15 +++++++++++++++
> >  criu/proc_parse.c |  7 +++++++
> >  images/mnt.proto  |  1 +
> >  3 files changed, 23 insertions(+)
> > 
> > diff --git a/criu/mount.c b/criu/mount.c
> > index 6f47655..e85e5d9 100644
> > --- a/criu/mount.c
> > +++ b/criu/mount.c
> > @@ -1679,6 +1679,17 @@ out:
> >  	return ret;
> >  }
> >  
> > +static int debugfs_parse(struct mount_info *pm)
> > +{
> > +	/* tracefs is automounted underneath debugfs sometimes, and the
> > +	 * kernel's overmounting protection prevents us from mounting debugfs
> > +	 * first without tracefs, so let's always mount debugfs MS_REC.
> > +	 */
> > +	pm->flags |= MS_REC;
> > +
> > +	return 0;
> > +}
> > +
> >  static int cgroup_parse(struct mount_info *pm)
> >  {
> >  	if (!(root_ns_mask & CLONE_NEWCGROUP))
> > @@ -1780,6 +1791,10 @@ static struct fstype fstypes[] = {
> >  	}, {
> >  		.name = "debugfs",
> >  		.code = FSTYPE__DEBUGFS,
> > +		.parse = debugfs_parse,
> > +	}, {
> > +		.name = "tracefs",
> > +		.code = FSTYPE__TRACEFS,
> >  	}, {
> >  		.name = "cgroup",
> >  		.code = FSTYPE__CGROUP,
> > diff --git a/criu/proc_parse.c b/criu/proc_parse.c
> > index cebf21c..db1d0af 100644
> > --- a/criu/proc_parse.c
> > +++ b/criu/proc_parse.c
> > @@ -1381,6 +1381,13 @@ struct mount_info *parse_mountinfo(pid_t pid, struct ns_id *nsid, bool for_dump)
> >  			goto end;
> >  		}
> >  
> > +		if (new->fstype->code == FSTYPE__TRACEFS) {
> > +			pr_info("\tskipping tracefs mounted at %s\n", new->mountpoint + 1);
> > +			mnt_entry_free(new);
> > +			new = NULL;
> > +			goto end;
> > +		}
> > +
> >  		pr_info("\ttype %s source %s mnt_id %d s_dev %#x %s @ %s flags %#x options %s\n",
> >  				fsname, new->source,
> >  				new->mnt_id, new->s_dev, new->root, new->mountpoint,
> > diff --git a/images/mnt.proto b/images/mnt.proto
> > index fe1e4bb..9338ecb 100644
> > --- a/images/mnt.proto
> > +++ b/images/mnt.proto
> > @@ -20,6 +20,7 @@ enum fstype {
> >  	AUTO			= 16;
> >  	OVERLAYFS		= 17;
> >  	AUTOFS			= 18;
> > +	TRACEFS			= 19;
> >  };
> >  
> >  message mnt_entry {
> > -- 
> > 2.7.4
> > 
> > _______________________________________________
> > CRIU mailing list
> > CRIU at openvz.org
> > https://lists.openvz.org/mailman/listinfo/criu


More information about the CRIU mailing list