[Devel] Re: Thoughts on virtualizing task containers

Serge E. Hallyn serue at us.ibm.com
Mon Oct 1 10:24:47 PDT 2007


Quoting Paul Menage (menage at google.com):
> On 9/11/07, Serge E. Hallyn <serue at us.ibm.com> wrote:
> > Hi Paul,
> >
> > did any good ideas come up at the mini-summit or k-s, or were any
> > decisions made?
> >
> 
> Discussions were had, but decisions weren't really made.
> 
> My vague thoughts on how to do virtualization are below.

Thanks.  Actually doesn't seem that vague, and makes sense.
I'd advocate following this path.  Have you started any
patches for this?

> 1) Add support for a subsystem state object to be shared with its
> children. Specifically:
> - for each subsystem, have a <subsystem.inherit> control file, which
> defaults to 0. This can only be changed when the cgroup has no
> children
> - any children of a cgroup will share subsystem state with the parent
> for any subsystems whose <inherit> file is 1
> 
> This ties in with a request that Balbir made for being able to share
> resource limits between different levels of cgroups, but it's also
> useful for virtualization. It's something I wanted to describe in my
> OLS talk but didn't really have time for.
> 
> 2) have a virtualization cgroup subsystem, which like other subsystems
> can be included in at most one hierarchy. The virtualization subsystem
> might perhaps be the same thing as the nsproxy subsystem?
> 
> 3) when mounting a cgroup filesystem, if the virtualization subsystem
> is mounted, and the caller is not in its root cgroup (i.e. it's a
> guest), then:
> 
> - the guest can only see subsystems in the same hierarchy, which

same hierarchy as the virtualization cgroup subsystem, right?

> additionally have <inherit> set to 0
> 
> - the vfsmount returned from cgroup_get_sb() doesn't refer to the root
> of the hierarchy, but instead to the cgroup directory that the guest
> is in
> 
> - the guest can only mount a single hierarchy (which therefore must
> be a subset of the hierarchy that the guest is running in)
> 
> - at the time of mount, the <inherit> bits for any subsystems *not*
> selected by the guest get set to 1, thus any guest processes share the
> same subsystem state for those subsystems (this is analogous to having
> the subsystem not be mounted, at the root/host level).

So then for subsystems which the guest did select, are there
default values inherited from the parent cgroup, and restrictions based
on those?  Or does the guest get to set any values it wants for those
subsystems?
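
For concreteness, the inherit bit from (1) and the mount-time step from
(3) can be sketched as a toy user-space model. This is Python rather
than kernel C, and every name in it (Cgroup, SubsysState, guest_mount)
is hypothetical; it models only state sharing, not visibility rules or
limits, and it assumes the "no children" rule also applies when the
bits are set at mount time.

```python
# Toy model only: Cgroup, SubsysState and guest_mount are hypothetical
# names, not kernel interfaces.

class SubsysState:
    """Per-subsystem state object; may be shared by several cgroups."""
    def __init__(self, subsys):
        self.subsys = subsys


class Cgroup:
    def __init__(self, name, subsystems, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Stand-in for the <subsystem.inherit> control files, default 0.
        self.inherit = {s: 0 for s in subsystems}
        if parent is None:
            self.state = {s: SubsysState(s) for s in subsystems}
        else:
            # Share the parent's state where its inherit bit is 1,
            # otherwise allocate fresh state for this cgroup.
            self.state = {
                s: parent.state[s] if parent.inherit[s] else SubsysState(s)
                for s in subsystems
            }
            parent.children.append(self)

    def set_inherit(self, subsys, value):
        # The proposal only allows changes while there are no children.
        if self.children:
            raise PermissionError("cgroup has children")
        self.inherit[subsys] = int(bool(value))


def guest_mount(guest_cgroup, selected):
    """Model a guest mount: unselected subsystems get inherit=1."""
    for subsys in guest_cgroup.inherit:
        if subsys not in selected:
            guest_cgroup.set_inherit(subsys, 1)
    # The guest's view is rooted at its own cgroup, not the hierarchy root.
    return guest_cgroup
```

In this model a guest that selects only "cpu" ends up with memory state
shared by everything below it, while each child still gets its own cpu
state. What initial values or limits that fresh cpu state should carry
is exactly the open question above, and the sketch does not answer it.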

> This approach is a little more restrictive than I'd like, but I think
> it should support the basic nested virtual server model reasonably
> well.
> 
> These changes are going to require a little bit of plumbing in the
> core cgroup code, but should have very little effect on any subsystems
> themselves, except for a few ways:
> 
> - each subsystem will now have a private parent/child tree running
> through its subsystem states, rather than having to use the main
> cgroup tree
> 
> - there will no longer be a direct mapping from a subsystem state to a
> cgroup. I'm not sure that this will cause anyone a problem. We'll have
> to tweak the current cgroup iteration interfaces to instead iterate
> across all the processes in a subsystem state, which may include
> multiple cgroups
> 
> Paul
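
The two subsystem-facing changes listed above (a private parent/child
tree through the subsystem states, and iteration over all processes in
a state that spans several cgroups) can be illustrated with another toy
model. Again this is Python, not kernel code, and the names SubsysState,
Cgroup and iter_pids are hypothetical.

```python
# Toy model only: these names are hypothetical, not kernel interfaces.

class SubsysState:
    def __init__(self, parent=None):
        # Private parent/child links, independent of the cgroup tree.
        self.parent = parent
        self.children = []
        self.cgroups = []  # every cgroup currently sharing this state
        if parent is not None:
            parent.children.append(self)


class Cgroup:
    def __init__(self, name, state, pids=()):
        self.name = name
        self.state = state
        self.pids = list(pids)
        state.cgroups.append(self)


def iter_pids(state):
    """Yield every process attached to one subsystem state."""
    for cgroup in state.cgroups:
        yield from cgroup.pids
```

With two cgroups attached to the same state, iter_pids() walks the
processes of both, which is the behaviour the reworked iteration
interface would need since a state no longer maps to a single cgroup.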