[Devel] Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
Sukadev Bhattiprolu
sukadev at linux.vnet.ibm.com
Thu Jun 18 15:40:06 PDT 2009
Eric W. Biederman [ebiederm at xmission.com] wrote:
| Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com> writes:
|
| > Prevent container-inits from using CLONE_PARENT
| >
| > If a container-init creates a sibling (using CLONE_PARENT), pid namespace
| > semantics become complicated:
| >
| > - the "active pid namespace" of the sibling will be the descendant
| > container, but it's not obvious if that is correct.
|
| It is correct: the sibling must not change pid namespaces. You are not
| allowed to escape out of a pid namespace.
|
| > - if container-init exits, it will terminate the sibling, but again
| > it's not clear if that is the correct behavior.
|
| Again correct because the container-init is the child reaper for the pid namespace.
| No reaper no namespace.
|
| > - the sibling exists in both parent and child containers while current
| > pid namespace semantics assume that only container-init can exist
| > in both parent/child containers.
|
| All tasks in the container also exist in the parent container.
| What assumption are you talking about?
You are right, that's not really different for CLONE_PARENT.
|
| > - the parent of the sibling is not a descendant of container-init
| > (while pid namespaces assume that all processes in the container
| > are descendants of the container-init)
|
| User space assumes that certainly. What part of the pid namespace
| code makes such an assumption?
I was referring only to the user-space view.
|
| > - When the sibling dies, the SIGCHLD is sent to its parent (if
| > alive), i.e., the signal escapes the container to a parent container.
| > (if the parent of the sibling exits, the container-init then becomes
| > the reaper of the sibling).
|
| Yes.
|
| > To keep pid namespace semantics simple, prevent container-inits from using
| > CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
| > and pid-namespace interactions.
|
| The only argument that I can see that carries any weight is that unix
| semantics fundamentally assume a process tree. Allowing init to use
| CLONE_PARENT creates a multi-rooted process tree.
Right.
|
| At which point the is_global_init check is foolish.
Well, I was trying to disable CLONE_PARENT only for pid namespaces.
Disabling CLONE_PARENT for global init seemed independent of namespaces,
and there was recent talk of potential users of CLONE_PARENT, so I am
not sure whether there is an init that uses the old threading model!
I don't have a convincing reason besides "let's enable it when the
uses/semantics of CLONE_PARENT with pid namespaces are clear".
|
| Eric
|
|
| > Untested, RFC patch :-)
| >
| > Signed-off-by: Sukadev Bhattiprolu <sukadev at us.ibm.com>
| > ---
| > kernel/fork.c | 8 ++++++++
| > 1 file changed, 8 insertions(+)
| >
| > Index: linux-mmotm/kernel/fork.c
| > ===================================================================
| > --- linux-mmotm.orig/kernel/fork.c 2009-06-17 18:23:23.000000000 -0700
| > +++ linux-mmotm/kernel/fork.c 2009-06-17 19:17:54.000000000 -0700
| > @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
| >  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
| >  		return ERR_PTR(-EINVAL);
| >
| > +	/*
| > +	 * To keep pid namespace semantics simple, prevent container-inits
| > +	 * from creating siblings.
| > +	 */
| > +	if ((clone_flags & CLONE_PARENT) &&
| > +	    is_container_init(current) && !is_global_init(current))
| > +		return ERR_PTR(-EINVAL);
| > +
| >  	retval = security_task_create(clone_flags);
| >  	if (retval)
| >  		goto fork_out;
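
For anyone following the thread who has not used CLONE_PARENT, the sibling
relationship discussed above can be observed from user space. What follows is
only a minimal sketch, not part of the patch: the helper name
clone_parent_makes_sibling() and the struct sibling_report are made up for
illustration, and it assumes a Linux architecture where flags is the first
argument of the raw clone syscall (x86-64 and arm64, for example). A forked
helper calls clone with CLONE_PARENT, and the resulting task reports its
getppid(), which should be the original process rather than the helper.

```c
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback in case <sched.h> hides the flag without _GNU_SOURCE;
 * the value matches the Linux UAPI definition. */
#ifndef CLONE_PARENT
#define CLONE_PARENT 0x00008000
#endif

/* Hypothetical report format, used only by this sketch. */
struct sibling_report {
	pid_t self;
	pid_t parent;
};

/*
 * Returns 1 if a task created with CLONE_PARENT sees the caller of this
 * function (the helper's parent) as its own parent, 0 if not, -1 on error.
 */
int clone_parent_makes_sibling(void)
{
	int fds[2];
	struct sibling_report r = { -1, -1 };
	pid_t helper;
	ssize_t n;

	if (pipe(fds) < 0)
		return -1;

	helper = fork();
	if (helper < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (helper == 0) {
		/* Raw clone with a NULL stack behaves like fork(), except that
		 * CLONE_PARENT makes the new task a sibling of the helper. */
		long rc = syscall(SYS_clone,
				  (unsigned long)(CLONE_PARENT | SIGCHLD),
				  0UL, 0UL, 0UL, 0UL);

		if (rc == 0) {
			r.self = getpid();
			r.parent = getppid();
		}
		/* The sibling (rc == 0) reports itself; the helper writes
		 * only when clone failed, leaving both fields at -1. */
		if (rc <= 0 && write(fds[1], &r, sizeof(r)) != (ssize_t)sizeof(r))
			_exit(2);
		_exit(rc < 0 ? 1 : 0);
	}

	close(fds[1]);
	n = read(fds[0], &r, sizeof(r));
	close(fds[0]);

	/* The helper is our child; so is the sibling it created. */
	waitpid(helper, NULL, 0);
	if (n != (ssize_t)sizeof(r) || r.self < 0)
		return -1;
	waitpid(r.self, NULL, 0);

	return r.parent == getpid();
}
```

The sketch forks a helper first so that the new sibling ends up as a child of
the calling process, which can then reap it. If the helper were a
container-init, the sibling's parent would instead live in the parent
container, which is the case the patch above rejects with -EINVAL.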