[Devel] [PATCH rh7 v2] ve/devpts: Support per-VE mount namespace

Wed Jul 22 03:25:21 PDT 2015

On Wed, Jul 22, 2015 at 12:59:22PM +0300, Vladimir Davydov wrote:
> 
> Please don't do that :-)

OK

> > > If we initialize ve->devpts_sb lazily (as we do now), we don't need this
> > > hunk as well as devpts_once flag, do we? This would look cleaner IMO.
> > 
> >  1) _devpts_mnt initialized (mounted) on container start time, as we do for
> >     a number of other subsystems, I think keeping it in that form better from
> >     readability view, no?
> 
> I'd like to keep ve_start_container as simple as possible.

OK

> > 
> >  2) first attempt to mount devpts inside container should be treated in a
> >     special way (note that restore procedure starts from inside of ve0, so
> >     we can't use ve_is_super here) -- ie first mount of devpts must always
> >     return premounted superblock we allocated when VE has been initialized.
> 
> I don't see how your patch helps that.

In exactly the way it should -- we have a flag for the first devpts mount
call, @devpts_once (for ve0 it's always 1, so ve0 doesn't do anything special).
For containers, in turn, we premount an isolated devpts, and when the first
mount is called inside the container (or by criu on restore) we always return
our premounted _devpts_mnt regardless of the @newinstance setting. This
applies to the first mount call only; subsequent calls proceed as usual.

Vladimir, I suspect I didn't explain well what I'm doing and what I wanted
to achieve :/ Let me try again.

There are two scenarios for how init (systemd-based or anything else) works
inside a container:

1) Simply mount devpts without the newinstance option. That's how old
   containers or an ubuntu-14 container work: they simply mount devpts and
   don't consider that they may be running under a lightweight virtualization
   environment. For this case we always provide a per-container devpts
   instance by making changes inside the kernel itself, so that containers
   neither see the node's devpts nor can reach other containers' devpts.
   Because it is a separate superblock, CRIU notices that and adds the
   @newinstance option to the mount options that will be used on restore.
   Note this point: the container's init doesn't pass @newinstance, we add it
   by hand inside CRIU. This results in a different code flow inside the
   kernel -- init doesn't pass @newinstance on container start, while on
   restore we do pass it. Thus we have to work around this inside the kernel,
   and for that reason I treat the first mount as a special one and return
   the container's virtualized mount point.

2) Mount devpts with @newinstance from the very beginning, as systemd
   init does in a centos-7 or fedora container. For this case we don't need
   these hacks, but we still have to support old containers.
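
To make the first-mount rule concrete, here is a minimal userspace sketch of
the decision, not the kernel code from the patch. All names in it (ve_model,
ve_model_init, devpts_mount_model, the enum values) are hypothetical and exist
only for illustration; the only things taken from the description above are
the @devpts_once flag, the premounted instance, and the fact that @newinstance
is ignored on the first call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-VE state; not the real ve_struct. */
struct ve_model {
	bool devpts_once;	/* true once the first mount has been served */
};

/* Which path a devpts mount request takes in this model. */
enum devpts_path {
	DEVPTS_PREMOUNTED,	/* return the instance premounted at VE start */
	DEVPTS_REGULAR_PATH,	/* fall through, @newinstance honored as usual */
};

/* ve0 starts with the flag already set, so it never takes the special path. */
static void ve_model_init(struct ve_model *ve, bool is_ve0)
{
	assert(ve != NULL);
	ve->devpts_once = is_ve0;
}

/*
 * Model of the first-shot rule: the first mount in a container always gets
 * the premounted instance, whatever @newinstance says; later mounts are
 * not special-cased. The newinstance argument is deliberately unused on
 * the first call and left to the regular path afterwards.
 */
static enum devpts_path devpts_mount_model(struct ve_model *ve, bool newinstance)
{
	assert(ve != NULL);
	(void)newinstance;

	if (!ve->devpts_once) {
		ve->devpts_once = true;
		return DEVPTS_PREMOUNTED;
	}
	return DEVPTS_REGULAR_PATH;
}
```

In this model, init on container start (no @newinstance) and criu on restore
(with @newinstance) both land on DEVPTS_PREMOUNTED for their first call, which
is the whole point of the hack.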

> > That's a dirty hack but I don't see other way for workaround -- criu itself
> > targeted on vanilla kernel which doesn't provide devpts virtualization by
> > default.
> > 
> > Or you mean to mark _devpts_mnt = nil by default, drop init/fini routines
> > and use it solely instead of devpts_once + _devpts_mnt pair?
> 
> Yeah, that's what I mean, but you'll have to keep a reference to the
> super block rather than vfsmount on ve_struct for that.

I fear this won't help with the one-shot first mount, though.

I'll still have to do the first-shot hack :/