[CRIU] [PATCH v3 2/2] restore: correctly restore cgroup mounts inside a container

Tycho Andersen tycho.andersen at canonical.com
Mon Mar 28 07:27:36 PDT 2016


On Mon, Mar 28, 2016 at 01:42:55PM +0300, Pavel Emelyanov wrote:
> On 03/25/2016 10:26 PM, Tycho Andersen wrote:
> > Before the nsroot= mount option, we were just getting lucky because the
> > cgroup superblocks "matched" when inspecting them from userspace, so we
> > were actually getting a bind mount from the host when migrating from within
> > cgroup namespaces.
> > 
> > Instead, let's actually do a new (i.e. not a bind mount) for cgroup
> > namespaces. For this, we need two things:
> > 
> > 1. to prepare the cgroup namespace (and thus the cgroups) before the mount
> >    ns, so when the mount() occurrs it is relative to the right cgroup path.
> > 
> > 2. not reject cgroup filesystems with no root. A cgroup ns mount looks
> >    like:
> > 
> > 	 223 222 0:22 /lxc/unpriv /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/lxc/unpriv
> > 
> >    i.e. it has /lxc/unpriv as its root, and thus doesn't look rooted to CRIU.
> >    We introduce the fstype->munge hook to rewrite this root to /, since it
> >    is handled by the cgroup ns infrastructure.
> > 
> > v2: add new fstype->munge hook, allowing fstypes to munge their parsed
> >     mountinfo entries if they want to. this allows us to get rid of the
> >     ugly hacks with FSTYPE__CGROUP everywhere in teh patch.
> > 
> > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > ---
> >  criu/cr-restore.c         | 18 +++++++++---------
> >  criu/include/proc_parse.h |  1 +
> >  criu/mount.c              | 14 ++++++++++++++
> >  criu/proc_parse.c         |  3 +++
> >  4 files changed, 27 insertions(+), 9 deletions(-)
> > 
> > diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> > index ffd2f01..967d5fa 100644
> > --- a/criu/cr-restore.c
> > +++ b/criu/cr-restore.c
> > @@ -1608,6 +1608,15 @@ static int restore_task_with_children(void *_arg)
> >  			goto err;
> >  	}
> >  
> > +	/*
> > +	 * Call this _before_ forking to optimize cgroups
> > +	 * restore -- if all tasks live in one set of cgroups
> > +	 * we will only move the root one there, others will
> > +	 * just have it inherited.
> > +	 */
> > +	if (prepare_task_cgroup(current) < 0)
> > +		goto err;
> > +
> >  	/* Restore root task */
> >  	if (current->parent == NULL) {
> >  		if (restore_finish_stage(CR_STATE_RESTORE_NS) < 0)
> > @@ -1640,15 +1649,6 @@ static int restore_task_with_children(void *_arg)
> >  	if (prepare_mappings())
> >  		goto err;
> >  
> > -	/*
> > -	 * Call this _before_ forking to optimize cgroups
> > -	 * restore -- if all tasks live in one set of cgroups
> > -	 * we will only move the root one there, others will
> > -	 * just have it inherited.
> > -	 */
> > -	if (prepare_task_cgroup(current) < 0)
> > -		goto err;
> > -
> >  	if (prepare_sigactions() < 0)
> >  		goto err;
> >  
> > diff --git a/criu/include/proc_parse.h b/criu/include/proc_parse.h
> > index 5de5c86..b9b44aa 100644
> > --- a/criu/include/proc_parse.h
> > +++ b/criu/include/proc_parse.h
> > @@ -114,6 +114,7 @@ struct fstype {
> >  	int (*dump)(struct mount_info *pm);
> >  	int (*restore)(struct mount_info *pm);
> >  	int (*parse)(struct mount_info *pm);
> > +	int (*munge)(struct mount_info *pm);
> >  	mount_fn_t mount;
> >  };
> >  
> > diff --git a/criu/mount.c b/criu/mount.c
> > index eb8d058..427ccc1 100644
> > --- a/criu/mount.c
> > +++ b/criu/mount.c
> > @@ -1631,6 +1631,19 @@ out:
> >  	return ret;
> >  }
> >  
> > +static int cgroup_munge(struct mount_info *pm)
> > +{
> > +	if (!(root_ns_mask & CLONE_NEWCGROUP))
> > +		return 0;
> > +
> > +	xfree(pm->root);
> > +	pm->root = xstrdup("/");
> > +	if (!pm->root)
> > +		return -1;
> > +
> > +	return 0;
> > +}
> > +
> >  static int dump_empty_fs(struct mount_info *pm)
> >  {
> >  	int fd, ret = -1;
> > @@ -1716,6 +1729,7 @@ static struct fstype fstypes[32] = {
> >  	}, {
> >  		.name = "cgroup",
> >  		.code = FSTYPE__CGROUP,
> > +		.munge = cgroup_munge,
> >  	}, {
> >  		.name = "aufs",
> >  		.code = FSTYPE__AUFS,
> > diff --git a/criu/proc_parse.c b/criu/proc_parse.c
> > index 24a9154..d39bd96 100644
> > --- a/criu/proc_parse.c
> > +++ b/criu/proc_parse.c
> > @@ -1295,6 +1295,9 @@ static int parse_mountinfo_ent(char *str, struct mount_info *new, char **fsname)
> >  	if (parse_sb_opt(opt, &new->sb_flags, new->options))
> >  		goto err;
> >  
> > +	if (new->fstype->munge && new->fstype->munge(new) < 0)
> > +		goto err;
> > +
> 
> Hm... Few lines below there's already a call to ->parse() callback :)

Sure, I can switch it to just use the parse callback. I'll resend.

Tycho

> >  	ret = 0;
> >  ret:
> >  	xfree(opt);
> > 
> 


More information about the CRIU mailing list