[Devel] Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Eric W. Biederman ebiederm at xmission.com
Wed Mar 3 12:16:45 PST 2010


Pavel Emelyanov <xemul at parallels.com> writes:

>> I just played with this and if you make the semantics of unshare(CLONE_NEWPID)
>> to be that you become the idle task aka pid 0, and not the init task pid 1 the
>> implementation is trivial.
>
> This is not ... handy - if after enter you have pid 0 you obviously
> can't perform 2 parallel enters.

2 parallel enters?  I meant you have pid 0 in the entered pid namespace.
You have pid 0 because your pid simply does not map.

There is nothing that makes to parallel enters impossible in that.
Even today we have one thread per cpu that has task->pid == &init_struct_pid
which is pid 0.

For the case of unshare where we are designed to be used with PAM I don't
think my proposed semantics work.  For a join needed an extra fork before
you are really in the pid namespace should be minor.

> The way I see it:
>
> As far as the numbers reported to the userspace are concerned:
> 1. task, that enters is still visible by its old parent by old pid
> 2. task, that enters gets some pid within the entering namespace
>    and reports its parent pid to have pid 1 (init obviously doesn't
>    care)
> 3. we _can_ try to allocate new pid equal to the old one so that
>    glibc stays happy
>
>
> As far as the pointers are concerned:
> 1. parent pointer doesn't change
> 2. task_pid(tsk) one (i.e. struct pid * one) _can_ change if
>    a) we don't allow threads enter (de_thread problem is handeled)
>    b) we don't allow leave the group/session, i.e. check, that there
>       is the only one task that enters lives in its pgid/sid
>    c) we wait for the quiescent state to pass by before destroying
>       the old pid to handle race with sys_kill()

That doesn't handle the case of cached struct pids.  A good example is
waitpid, where it waits for a specific struct pid.  Which means that
allocating a new struct pid and changing task->pid will cause
waitpid(pid) to wait forever...

To change struct pid would require the refcount on struct pid to show
no references from anywhere except the task_struct.

At the cost of a little memory we can solve that problem for unshare
if we have a an extra upid in struct pid, how we verify there is space
in struct pid I'm not certain.

I do think that at least until someone calls exec the namespace pids are
reported to the process itself should not change.  That is kill and
waitpid etc.  Which suggests an implementation the opposite of what
I proposed.  With ns_of_pid(task_pid(current)) being used as the
pid namespace of children, and current->nsproxy->pid_ns not changing
in the case of unshare.

Shrug.

Or perhaps this is a case where we use we can implement join with
an extra process but we can't implement unshare, because the effect
cannot be immediate.

Eric
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list