[Devel] [PATCH rh7 v2] ve/devpts: Support per-VE mount namespace

Wed Jul 22 02:23:43 PDT 2015

On Wed, Jul 22, 2015 at 11:55:30AM +0300, Vladimir Davydov wrote:
> > +
> > +#ifdef CONFIG_VE
> >  #include <linux/ve.h>
> > +#endif
> 
> Why?

There doesn't seem to be much OpenVZ-related stuff here, so I wrapped
everything, including the header, in the config option; this should make
compilation a bit faster.

> >  
> > +#ifndef CONFIG_VE
> >  	/* Require newinstance for all user namespace mounts to ensure
> >  	 * the mount options are not changed.
> >  	 */
> > -	if (!IS_ENABLED(CONFIG_VE) &&
> > -	    (current_user_ns() != &init_user_ns) && !opts.newinstance)
> > +	if ((current_user_ns() != &init_user_ns) && !opts.newinstance)
> 
> FWIW, once we switch to userns, which is going to be pretty soon I hope,
> we have to tweak this again.

Yes, but at the moment this hunk reverts the code back to its former state.
I believe that until we have a strong reason to change it, it's better to
keep the code unmodified (in turn, the CONFIG_VE markers show the
modifications brought in by us).

> >  		return ERR_PTR(-EINVAL);
> > +#endif
> >  
> > +#ifdef CONFIG_VE
> > +	/*
> > +	 * Each container has to have own devpts superblock for isolation
> > +	 * sake but it makes a bad joke for us: in CRIU we test if devpts
> > +	 * device in container is the same as on the node, to figure out
> > +	 * if @newinstance option has to be passed (simply because in
> > +	 * vanilla kernel there is no such devpts virtualization) on
> > +	 * the restore. Thus every time we're restoring container
> > +	 * we pass @newinstance option even if container has been
> > +	 * started without this option initially.
> > +	 *
> > +	 * To workaround this situation here is an ugly hack: first
> > +	 * mount of devpts inside container always runs without
> > +	 * @newinstance option providing back virtualized superblock.
> > +	 * The next mounts inside container go in a regular way.
> > +	 *
> > +	 * Note @devpts_once is always set for node. And be careful
> > +	 * about @else branch below.
> > +	 */
> > +	if (!get_exec_env()->devpts_once && get_exec_env()->_devpts_mnt)
> > +		s = sget(fs_type, compare_init_pts_sb, set_anon_super, flags, NULL);
> > +	else
> > +#endif
> 
> If we initialize ve->devpts_sb lazily (as we do now), we don't need this
> hunk as well as devpts_once flag, do we? This would look cleaner IMO.

 1) _devpts_mnt is initialized (mounted) at container start time, as we do
    for a number of other subsystems. I think keeping it in that form is
    better from a readability point of view, no?

 2) the first attempt to mount devpts inside a container should be treated
    in a special way (note that the restore procedure starts from inside
    ve0, so we can't use ve_is_super here), i.e. the first mount of devpts
    must always return the premounted superblock we allocated when the VE
    was initialized.

That's a dirty hack, but I don't see another way to work around it: criu
itself targets the vanilla kernel, which doesn't provide devpts
virtualization by default.
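
To make the intended behaviour concrete, here is a minimal userspace
sketch of the decision above. It is not the kernel code: struct sb_model,
struct ve_model, regular_mount() and pick_devpts_sb() are made-up names for
illustration, and the place where devpts_once gets set is my assumption,
since the quoted hunk doesn't show it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a devpts superblock. */
struct sb_model {
	int id;
};

/* Hypothetical stand-in for the two per-VE fields discussed above. */
struct ve_model {
	bool devpts_once;            /* true once the first mount was served */
	struct sb_model *devpts_mnt; /* premounted at VE start, or NULL */
};

/* Stand-in for the regular (vanilla) mount path. */
static struct sb_model *regular_mount(struct sb_model *fresh)
{
	return fresh;
}

/*
 * First mount inside a container returns the premounted superblock,
 * every later mount goes the regular way.
 */
static struct sb_model *pick_devpts_sb(struct ve_model *ve,
				       struct sb_model *fresh)
{
	if (!ve->devpts_once && ve->devpts_mnt) {
		ve->devpts_once = true; /* assumption: flag flips here */
		return ve->devpts_mnt;
	}
	return regular_mount(fresh);
}
```

With devpts_once preset to true (as for the node), the first branch is
never taken and only the regular path runs.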

Or do you mean to set _devpts_mnt = NULL by default, drop the init/fini
routines, and use it alone instead of the devpts_once + _devpts_mnt pair?
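
If so, I imagine it would look roughly like the userspace sketch below.
This is only my reading of the suggestion, not code from the patch:
struct sb_model, struct ve_lazy and lazy_devpts_sb() are made-up names,
and reusing the stored superblock on later mounts is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for a devpts superblock. */
struct sb_model {
	int id;
};

/* Single per-VE pointer; NULL means "nothing mounted yet". */
struct ve_lazy {
	struct sb_model *devpts_mnt;
};

/*
 * Lazy variant: the first mount stores the freshly created superblock
 * in the VE, later mounts get the stored one back (assumed behaviour).
 */
static struct sb_model *lazy_devpts_sb(struct ve_lazy *ve,
				       struct sb_model *fresh)
{
	if (!ve->devpts_mnt)
		ve->devpts_mnt = fresh;
	return ve->devpts_mnt;
}
```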