[Devel] Re: containers development plans (July 10 version)

Serge E. Hallyn serge at hallyn.com
Thu Jul 12 11:45:00 PDT 2007


Quoting Kirill Korotaev (dev at sw.ru):
> Serge E. Hallyn wrote:
> > (If you missed earlier parts of this thread, you can catch up starting at
> > https://lists.linux-foundation.org/pipermail/containers/2007-July/005860.html)
> > 
> > Thanks for all the recent feedback.  I particularly added a lot from Paul
> > Menage and Cedric.
> > 
> > We are trying to create a roadmap for the next year of
> > 'container' development, to be reported to the upcoming kernel
> > summit.  Containers here is a bit of an ambiguous term, so we are
> > taking it to mean all of:
> > 
> > 	1. namespaces
> >                 kernel resource namespaces to support resource isolation
> >                 and virtualization for virtual servers and application
> >                 checkpoint/restart.
> > 	2. task containers framework
> >                 the task containers (or, as Paul Jackson suggests, resource
> >                 containers) framework by Paul Menage which especially
> >                 provides a framework for subsystems which perform resource
> >                 accounting and limits.
> > 	3. checkpoint/restart
> > 
> > A (still under construction) list of features we expect to be worked on
> > next year looks like this:
> > 
> >         1. completion of ongoing namespaces
> >                 pid namespace
> >                         merge two patchsets
>                              sukadev@ and Pavel already agreed and will resend it soon
> >                         clone_with_pid()
> >                         kthread cleanup
> >                                 especially nfs
> >                                 autofs
> >                         af_unix credentials (stores pid_t?)
> >                 net namespace
> >                 ro bind mounts
> 
> IMHO ro bind mounts are not related to namespaces in any way, but OK if you guys want to mention them.

Hmm, yes it's more for the "userspace containers" - meaning the
userspace usage of namespaces.  But I'm not sure it's worth breaking
that out.

> >                 sysvipc
> >                         "set identifier" syscall
> 
> the last one is related to checkpointing, so please move it from here...

It started under checkpointing, but I'll move it back :)

> >         2. continuation with new namespaces
> >                 devpts, console, and ttydrivers
> >                 user
> >                 time
> >                 namespace management tools
> >                 namespace entering  (using one of:)
> >                         bind_ns()
> >                         ns container subsystem
> >                         (vs refuse this functionality)
> >                 multiple /sys mounts
> >                         break /sys into smaller chunks?
> >                         shadow dirs vs namespaces
> >                 multiple proc mounts
> >                         likely need to extend on the work done for pid namespaces
> >                         i.e. other /proc files will need some care
> 
> virtualization of the various statistics in /proc, for top and other tools, belongs here too
> 
> >         3. any additional work needed for virtual servers?
> >                 i.e. in-kernel keyring usage for cross-usernamespace permissions, etc
> >                         nfs and rpc updates needed?
> >                         general security fixes
> 
> what is meant by "general security fixes"?

I think it means "we haven't thought it through enough"  :)

For instance, something needs to be done to be able to hand
partial capabilities to admins in a container/virtual server.  We've
talked about doing this using the in-kernel keyring, but we are far from
consensus or patches, and this will have to be solved.
 
> what I see additionally:
>  - device access controls (e.g. root in container should not have access to /dev/sda by default)

Yes, that kind of falls under the above, but I'll add it separately.

>  - filesystems access controls

ditto.
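
As a rough illustration of the device access control idea above (root in a container should not reach /dev/sda by default), here is a minimal default-deny whitelist sketch in Python. It is not taken from any posted patch: `DeviceRule`, `device_allowed`, and `CONTAINER_RULES` are hypothetical names, and the choice of whitelisted devices is an assumption made only for the example.

```python
from typing import List, NamedTuple, Optional


class DeviceRule(NamedTuple):
    """One whitelist entry (hypothetical structure, for illustration only)."""
    dev_type: str          # "b" for block devices, "c" for character devices
    major: Optional[int]   # None matches any major number
    minor: Optional[int]   # None matches any minor number


def device_allowed(rules: List[DeviceRule], dev_type: str,
                   major: int, minor: int) -> bool:
    """Default-deny check: access is allowed only if some rule matches."""
    for rule in rules:
        if rule.dev_type != dev_type:
            continue
        if rule.major is not None and rule.major != major:
            continue
        if rule.minor is not None and rule.minor != minor:
            continue
        return True
    return False


# Assumed whitelist for one container: /dev/null (c 1:3), /dev/zero (c 1:5),
# and any Unix98 pty slave (c 136:*). Nothing else is reachable.
CONTAINER_RULES = [
    DeviceRule("c", 1, 3),
    DeviceRule("c", 1, 5),
    DeviceRule("c", 136, None),
]

if __name__ == "__main__":
    # /dev/sda is block device 8:0 and is absent from the whitelist.
    print(device_allowed(CONTAINER_RULES, "b", 8, 0))
    print(device_allowed(CONTAINER_RULES, "c", 1, 3))
```

The point of the sketch is only the default-deny shape: a container starts with no device access and the admin adds entries, rather than starting with everything and removing /dev/sda.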

> >         4. task containers functionality
> >                 base features
> >                         virtualized containerfs mounts
> >                                 to support vserver management of sub-containers
> >                         locking cleanup
> >                         control file API simplification
> >                         control file prefixing with subsystem name
> >                 specific containers
> >                         userspace RBCE to provide controls for
> >                                 users
> >                                 groups
> >                                 pgrp
> >                                 executable
> >                         split cpusets into
> >                                 cpuset
> >                                 memset
> >                         network
> >                                 connect/bind/accept controller using iptables
> >                         network flow id control
> >                         userspace per-container OOM handler
> 
> I don't see much about resource management here at all.
> We need resource controls for a lot of stuff like
> - RSS
> - kernel memory and different parameters like number of tasks
> - disk quota
> - disk I/O
> - CPU fairness
> - CPU limiting
> - container aware OOM
> 
> imho it is all related and should be discussed.
> 
> >         5. checkpoint/restart
> >                 memory c/r
> >                         (there are a few designs and prototypes)
> >                         (though this may be ironed out by then)
> >                         per-container swapfile?
> >                 overall checkpoint strategy  (one of:)
> >                         in-kernel
> >                         userspace-driven
> >                         hybrid
> >                 overall restart strategy
> >                 use freezer API
> >                 use suspend-to-disk?
> > 
> > In the list of stakeholders, I try to guess based on past comments and
> > contributions what *general* area they are most likely to contribute in.
> > I may try to narrow those down later, but am just trying to get something
> > out the door right now before my next computer breaks.
> > 
> > Stakeholders:
> >         Eric Biederman
> >                 everything
> >         google
> >                 containers
> >         ibm
> >                 everything
> >         kerlabs
> >                 checkpoint/restart
> >         openvz
> >                 everything
> >         osdl (Masahiko Takahashi?)
> >                 checkpoint/restart
> >         Linux-VServer
> >                 namespaces+containers
> >         zap project
> >                 checkpoint/restart
> >         planetlab
> >                 everything
> >         hp
> >                 ?
> >         XtreemOS
> >                 checkpoint/restart
> > 
> > Is anyone else still missing from the list?
> > 
> > thanks,
> > -serge
> > 

thanks Kirill,

-serge



