[Devel] [RFC rh7 v2] ve/tty: vt -- Implement per VE support for console and terminals

Cyrill Gorcunov gorcunov at virtuozzo.com
Thu Aug 6 03:26:47 PDT 2015


On Thu, Aug 06, 2015 at 12:55:41PM +0300, Vladimir Davydov wrote:
> On Thu, Aug 06, 2015 at 12:39:14PM +0300, Cyrill Gorcunov wrote:
> > On Thu, Aug 06, 2015 at 12:30:46PM +0300, Vladimir Davydov wrote:
> > > > 
> > > > As for me, ioctls on their own are not bad at all, but having some
> > > > native interface (such as we do with ve.X entries in cgroups)
> > > > makes it open for scripting at least: any new feature is way
> > > > easier to test directly from bash without writing some
> > > > implementation in vzctl. To be fair, the first argument I
> > > > heard was like "we're moving to cgroups interface. period." ;)
> > >
> > > I talked to Konstantin and Pavel and here goes the current policy
> > > regarding the user API:
> > > 
> > >  - We rework only those pieces of API that do not currently work and
> > >    cannot be fixed by design (like ve create/enter).
> > > 
> > >  - Since each container now has a unique integer ve.veid, which makes it
> > >    possible to call old ioctls on uuid-named containers, we use ioctls
> > >    for such containers too wherever possible.
> > > 
> > >  - The idea to move from a binary API (ioctl) to a text-based one
> > >    (cgroup) does sound appealing and maybe we will eventually switch
> > >    to it, but to make the new API look good, we need time to design
> > >    it well, so that we don't have to rework it once again in Vz8. So
> > >    we'd better postpone this work until we have enough time.
> > > 
> > > BTW, I don't like the ve cgroup at all, to tell you the truth. That's
> > > why I personally vote for sticking to ioctls. Its lifetime and
> > > start/stop rules make me think we should have implemented it as a
> > > namespace, not as a cgroup. I wonder if that would be acceptable from
> > > the vzctl and criu pov.
> > 
> > Thanks for summarizing all the bullets! You know, I forgot to mention
> > that cgroups actually save us a little in criu. For example,
> > imagine we need to configure some resource via our non-general ioctl
> > code -- this requires either a plugin in criu or some custom scripting,
> > which is not convenient.
> 
> But you do need some scripting anyway, because the ve cgroup is not
> mainstream, no?

Only minimal scripting. criu scans all cgroups a task belongs to and
remembers them (well, it's a bit more complicated, but you get the idea).
That said, if some task belongs to some non-vanilla cgroup -- criu doesn't
care, it will try to handle it.
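
To illustrate the idea of scanning the cgroups a task belongs to, here is
a minimal sketch in Python. It is not criu's actual code; it only parses
text in the /proc/<pid>/cgroup format ("hierarchy-ID:controllers:path").
The names CgroupEntry, parse_proc_cgroup and the sample contents
(including the "ve" line) are made up for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into a list of entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice: the cgroup path itself may contain ':'.
        hid, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append(CgroupEntry(int(hid), names, path))
    return entries


# Hypothetical sample; on a live system read /proc/<pid>/cgroup instead.
SAMPLE = """\
4:ve:/200
3:cpu,cpuacct:/200
1:name=systemd:/user.slice
"""

if __name__ == "__main__":
    for entry in parse_proc_cgroup(SAMPLE):
        print(entry.hierarchy_id, entry.controllers, entry.path)
```

Nothing here is specific to a controller name, which is why a
non-vanilla controller such as "ve" needs no special handling at this
stage.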

There is a side note I have to add -- currently criu only restores cgroup
properties it knows of, thus if, say, we meet some cgroup ve/200/bla/
which has some custom property "xxx", criu will ignore it. I've
made a patch which allows specifying non-standard properties to be
c/r'ed, but the patch is not yet merged. When it gets merged, we can
specify all our "custom virtuozzo specific" properties in the libvzctl
script which calls criu for c/r and handle them without much trouble.
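
A rough sketch of what c/r of an explicit list of properties boils down
to, in Python. This is not the patch mentioned above and not criu's
interface; dump_props and restore_props are hypothetical helpers, and
the property name "xxx" is just the placeholder from this mail. Each
property is assumed to be a plain file inside the cgroup directory.

```python
from pathlib import Path
from typing import Dict, Iterable


def dump_props(cgroup_dir: str, names: Iterable[str]) -> Dict[str, str]:
    """Read the listed property files from a cgroup directory."""
    saved = {}
    for name in names:
        prop = Path(cgroup_dir) / name
        # Properties missing from this cgroup are silently skipped.
        if prop.is_file():
            saved[name] = prop.read_text().strip()
    return saved


def restore_props(cgroup_dir: str, saved: Dict[str, str]) -> None:
    """Write previously dumped values back into a cgroup directory."""
    for name, value in saved.items():
        (Path(cgroup_dir) / name).write_text(value + "\n")
```

The caller passes the list of extra property names explicitly, so
anything outside the known set plus that list is still ignored.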

> Anyway, if it is difficult for criu to work with ioctls, we should
> probably revise our policy. Maybe we could move all user API that needs
> to be called on start/restore to the ve cgroup, while leaving as ioctls
> only those calls that are supposed to be made when the container is
> running (e.g. getting stats). What do you think?

At the moment we don't use any non-vanilla ioctls and I would really
prefer not to have to use them (one of the problems -- Pavel is currently
hosting the general criu code on github, and if for some reason we
start using custom ioctls, which would have to be wrapped into a
plugin in the worst case, this would bring code divergence -- instead of
simply using vanilla criu we would have to ship plugins in addition.
And testing all this zoo of tools definitely won't add any fun.)

Maybe let's gather all the ioctls we won't be able to avoid using in
one place and check what we can do?

> > Could you please share more details on the second-to-last sentence, I
> > mean the one about namespaces?
> 
> VE cgroup acts like a namespace, in fact. You can't just attach to it -
> you need to "start" it first. Plus, it has the "stop" point. We
> effectively abuse the cgroup interface for making it work. That's what
> makes me think it should have been implemented as a namespace. I mean,
> on container start
> 
>  - call clone with CLONE_VE
>  - write veid to /proc/pid/veid
>  - configure using ioctls
> 
> and use setns for attaching to it on container enter.
> 
> There is a problem though - we don't have enough bits in clone_flags, so
> I don't think it's plausible and we still have to get along with the ve
> cgroup.

I see, thanks for the info!

	Cyrill


