[Devel] Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
Eric W. Biederman
ebiederm at xmission.com
Wed Jun 17 20:20:21 PDT 2009
Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com> writes:
> Prevent container-inits from using CLONE_PARENT
>
> If a container-init creates a sibling (using CLONE_PARENT), pid namespace
> semantics become complicated:
>
> - the "active pid namespace" of the sibling will be the descendant
> container, but it's not obvious if that is correct.
It is correct: the sibling must not change pid namespaces. You are not
allowed to escape out of a pid namespace.
> - if container-init exits, it will terminate the sibling, but again
> it's not clear if that is the correct behavior.
Again, correct, because the container-init is the child reaper for the pid
namespace. No reaper, no namespace.
> - the sibling exists in both parent and child containers while current
> pid namespace semantics assume that only container-init can exist
> in both parent/child containers.
All tasks in the container also exist in the parent container.
What assumption are you talking about?
> - the parent of the sibling is not a descendant of container-init
> (while pid namespaces assume that all processes in the container
> are descendants of the container-init)
User space certainly assumes that. What part of the pid namespace
code makes such an assumption?
> - When the sibling dies, the SIGCHLD is sent to its parent (if
> alive), i.e. the signal escapes the container to a parent container.
> (if the parent of the sibling exits, the container-init then becomes
> the reaper of the sibling).
Yes.
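
The parentage behaviour discussed above can be observed from user space
without any pid namespace at all. The following is a minimal sketch, not
part of the patch: a forked "creator" process calls clone() with
CLONE_PARENT, and the resulting sibling reports its own pid and its
parent's pid back through a pipe. The names clone_parent_makes_sibling,
sibling_main and struct report are made up for this illustration. It
assumes Linux with glibc, and it deliberately avoids CLONE_NEWPID (which
needs privileges), so it only shows who the parent is, not the
cross-container case.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical helper type: what the sibling reports about itself. */
struct report {
	pid_t self;
	pid_t parent;
};

/* Runs in the sibling: send our pid and our parent's pid through the pipe. */
static int sibling_main(void *arg)
{
	int wfd = *(int *)arg;
	struct report r = { getpid(), getppid() };

	return write(wfd, &r, sizeof r) == (ssize_t)sizeof r ? 0 : 1;
}

/*
 * Returns 1 if a task created with CLONE_PARENT became a child of the
 * caller (that is, of the creator's parent), 0 if not, -1 on setup failure.
 */
static int clone_parent_makes_sibling(void)
{
	int fds[2];
	int status;
	struct report r;
	ssize_t n;
	pid_t creator;

	if (pipe(fds) != 0)
		return -1;

	creator = fork();
	if (creator < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (creator == 0) {
		char *stack = malloc(STACK_SIZE);

		close(fds[0]);
		if (!stack)
			_exit(1);
		/* Pass the top of the buffer: the stack grows downwards here. */
		if (clone(sibling_main, stack + STACK_SIZE,
			  CLONE_PARENT | SIGCHLD, &fds[1]) < 0)
			_exit(1);
		/* The creator cannot wait for the sibling: it is not its child. */
		_exit(0);
	}

	close(fds[1]);
	n = read(fds[0], &r, sizeof r);
	close(fds[0]);

	if (waitpid(creator, &status, 0) != creator)
		return -1;
	if (n != (ssize_t)sizeof r)
		return -1;
	/* waitpid() only succeeds for our own children. */
	if (waitpid(r.self, &status, 0) != r.self)
		return 0;
	return r.parent == getpid();
}
```

Calling clone_parent_makes_sibling() from an ordinary process is expected
to return 1: the sibling is reaped by the caller, not by the process that
created it. If the creator were a container-init, that caller would live
in the parent container, which is the SIGCHLD escape described in the
quoted text.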
> To keep pid namespace semantics simple, prevent container-inits from using
> CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
> and pid-namespace interactions.
The only argument that I can see that carries any weight is that unix
semantics fundamentally assume a process tree. Allowing init to use
CLONE_PARENT creates a multi-rooted process tree.
At which point the is_global_init check is foolish.
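
One possible reading of that remark, sketched as a user-space model and
not as kernel code: if the reason for the restriction is that init must
not create a second root of the process tree, then the reason applies to
the global init as well, and exempting it buys nothing. The struct
model_task and both check_* functions below are hypothetical stand-ins.
The model assumes is_container_init() means "pid 1 in its own pid
namespace" and is_global_init() means "pid 1 in the initial namespace".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two facts the kernel helpers report. */
struct model_task {
	bool is_ns_init;	/* pid 1 in its own pid namespace */
	bool is_global_init;	/* pid 1 in the initial pid namespace */
};

/* Models the check as posted in the RFC: the global init is exempt. */
static int check_as_posted(unsigned long flags, const struct model_task *t)
{
	if ((flags & CLONE_PARENT) && t->is_ns_init && !t->is_global_init)
		return -EINVAL;
	return 0;
}

/* Models the variant without the exemption: any namespace init is refused. */
static int check_without_exemption(unsigned long flags,
				   const struct model_task *t)
{
	if ((flags & CLONE_PARENT) && t->is_ns_init)
		return -EINVAL;
	return 0;
}
```

The two functions differ only for the global init: the posted form lets
it use CLONE_PARENT, the second form refuses it like any other namespace
init. Ordinary tasks pass both checks.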
Eric
> Untested, RFC patch :-)
>
> Signed-off-by: Sukadev Bhattiprolu <sukadev at us.ibm.com>
> ---
> kernel/fork.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-mmotm/kernel/fork.c
> ===================================================================
> --- linux-mmotm.orig/kernel/fork.c 2009-06-17 18:23:23.000000000 -0700
> +++ linux-mmotm/kernel/fork.c 2009-06-17 19:17:54.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
>  		return ERR_PTR(-EINVAL);
>  
> +	/*
> +	 * To keep pid namespace semantics simple, prevent container-inits
> +	 * from creating siblings.
> +	 */
> +	if ((clone_flags & CLONE_PARENT) &&
> +	    is_container_init(current) && !is_global_init(current))
> +		return ERR_PTR(-EINVAL);
> +
>  	retval = security_task_create(clone_flags);
>  	if (retval)
>  		goto fork_out;