[Devel] Re: [PATCH 00/10] Containers(V10): Generic Process Containers
Serge E. Hallyn
serue at us.ibm.com
Wed Jun 6 17:05:59 PDT 2007
Quoting Paul Jackson (pj at sgi.com):
> > > I wasn't paying close enough attention to understand why you couldn't
> > > do it in two steps - make the container, and then populate it with
> > > resources.
> >
> > Sorry, please clarify - are you saying that now you do understand, or
> > that I should explain?
>
> Could you explain -- I still don't understand why you need this option.
> I still don't understand why you can't do it in two steps - make the
> container, then add cpu/mem separately.
Sure - the key is that the ns subsystem uses container_clone() to
automatically create a new container (on sys_unshare() or clone(2)
with certain flags) and move the current task into it. Let's say
we have done
mount -t container -o ns,cpuset nsproxy /containers
and we, as task 875, happen to be in the topmost container:
/containers/
Now we fork task 999 which does an unshare(CLONE_NEWNS), or we just
clone(CLONE_NEWNS). This will create
/containers/node_999
and move task 999 into that container. However, when container_clone()
calls attach_task(), cpuset refuses the attach, because a newly created
cpuset has no cpus or mems assigned. So container_clone() fails, and in
turn sys_unshare() or clone() fails. A login using the pam_namespace.so
module would fail this way whenever the ns and cpuset subsystems are
mounted together.
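To make the failure chain concrete, here is a minimal user-space model of it. This is not kernel code. struct model_cpuset, model_cpuset_can_attach() and model_container_clone() are hypothetical stand-ins, the 64-bit masks replace the real cpu and node masks, and the -ENOSPC return value is an assumption about how cpuset reports an empty cpuset.

```c
/*
 * Hypothetical user-space model (not kernel code) of why container_clone()
 * fails when the ns and cpuset subsystems are mounted together.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cpuset state: one bit per CPU and one bit per memory node. */
struct model_cpuset {
	uint64_t cpus;
	uint64_t mems;
};

/* Models cpuset's rule: refuse to attach a task to a cpuset that has no
 * cpus or no mems, since the task could neither run nor allocate memory.
 * -ENOSPC is an assumed error code for this case. */
static int model_cpuset_can_attach(const struct model_cpuset *cs)
{
	assert(cs != NULL);
	if (cs->cpus == 0 || cs->mems == 0)
		return -ENOSPC;
	return 0;
}

/* Models container_clone(): the brand-new child cpuset starts out empty,
 * so attaching the current task to it is refused, and that error is what
 * sys_unshare() or clone() would return to the caller. */
static int model_container_clone(struct model_cpuset *child)
{
	assert(child != NULL);
	child->cpus = 0;	/* new cpusets get no cpus ... */
	child->mems = 0;	/* ... and no memory nodes */
	return model_cpuset_can_attach(child);
}
```

In this model every call to model_container_clone() returns -ENOSPC, while a cpuset with at least one CPU and one memory node attaches successfully.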
We could special-case this by having
kernel/container.c:container_clone() check whether one of the subsystems
is cpusets and, if so, set default values for mems and cpus, but
that is rather ugly. A cleaner alternative might be to add a
container_subsys->inherit_defaults() handler, called from
container_clone(); for cpusets it would set cpus and mems to the
parent's values minus any values that siblings hold exclusively. If
that leaves nothing, then attach_task() is still refused and the
unshare() or clone() still fails, but this time with good reason.
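A rough sketch of what the cpuset side of such an inherit_defaults() hook might compute, again as a user-space model. model_cpuset_inherit_defaults() and struct model_cpuset are hypothetical, not real kernel interfaces, and 64-bit masks stand in for the kernel's cpu and node masks.

```c
/*
 * Hypothetical model of a cpuset inherit_defaults() hook: the child takes
 * the parent's cpus and mems, minus whatever siblings hold exclusively.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct model_cpuset {
	uint64_t cpus;		/* one bit per CPU */
	uint64_t mems;		/* one bit per memory node */
	int cpu_exclusive;	/* siblings may not share these cpus */
	int mem_exclusive;	/* siblings may not share these mems */
};

static void model_cpuset_inherit_defaults(struct model_cpuset *child,
					  const struct model_cpuset *parent,
					  const struct model_cpuset *siblings,
					  size_t nsiblings)
{
	uint64_t cpus, mems;
	size_t i;

	assert(child != NULL && parent != NULL);
	cpus = parent->cpus;
	mems = parent->mems;

	/* Subtract resources that a sibling holds exclusively. */
	for (i = 0; i < nsiblings; i++) {
		if (siblings[i].cpu_exclusive)
			cpus &= ~siblings[i].cpus;
		if (siblings[i].mem_exclusive)
			mems &= ~siblings[i].mems;
	}

	child->cpus = cpus;
	child->mems = mems;
	child->cpu_exclusive = 0;	/* the clone does not claim exclusivity */
	child->mem_exclusive = 0;
}

/* Same assumed attach rule as before: empty cpus or mems are refused. */
static int model_cpuset_can_attach(const struct model_cpuset *cs)
{
	if (cs->cpus == 0 || cs->mems == 0)
		return -ENOSPC;
	return 0;
}
```

With a parent owning CPUs 0-3 and a cpu_exclusive sibling holding CPUs 0-1, the child inherits CPUs 2-3 and can be attached. If exclusive siblings already claim everything, the masks come out empty and the attach is still refused.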
thanks,
-serge