[Devel] Re: [v11][PATCH 9/9] Document clone_with_pids() syscall

Oren Laadan orenl at librato.com
Sat Nov 7 13:56:13 PST 2009



Sukadev Bhattiprolu wrote:
> Matt Helsley [matthltc at us.ibm.com] wrote:
> | > If userspace passes an array with n pids and there are k namespace levels
> | > then clone_with_pids() makes sure that the kernel sees a pid array like:
> | > 
> | > index	  0     ... k - (n + 1)        ...          k - 1
> | > 	+-----------------------+-------------------------+
> | > pid_t	| 0 ..................0 | <copied from userspace> |
> | > 	+-----------------------+-------------------------+
> | 
> | (diagram assumes n != k. If n == k then pids[0] is the pid desired
> | in the initial namespace.)
> 
> True.
> 
> Also I was not sure if we should prevent choosing pids in ancestor containers,
> since a process is not even supposed to know of ancestor namespaces. Is there
> a need for choosing pids in those namespaces?

IMHO this is a bit confusing.

A process observes a single namespace - the one in which it "lives".
There is no such thing as descendant namespaces for that process.
There may be ancestor namespaces.

The clone occurs in the context of the process. So the process that
is forking _must_ indicate pids in _ancestor_ namespaces if it wishes
to select pids in those (as is the case in c/r).

> 
> | 
> | > 
> | > So even though the order is different from choosepid(), the calling
> | > task still doesn't need to know its pidns level. Of course, just
> | > like choosepid(), n <= k or userspace will get EINVAL.
> | 
> | Forgot to mention that I prefer the way choosepid orders the pids.
> | It's not inspired by the way that the kernel implements pid namespaces
> | and has more to do with the way userspace sees things (IMHO).
> 
> Hmm, in general we C/R a descendant container. So the way userspace
> sees it at that point is "what are the pids of this process in my current
> and in any descendant namespaces". IOW, the pid in the container from which
> we checkpoint seems more interesting first, right? If so, aren't the pids[]
> better ordered from older namespace to younger namespace?

When we checkpoint, we use an external process to record the state of
(current or) descendant namespaces.

When we restart, we run in the context of the restarting process, so
we select a pid in the current and _ancestor_ namespaces.

So the order of pids as they (will) appear in the checkpoint image for
a given process will be from the ancestor namespace down to the descendant
namespaces. And this is how we (will) hand them over to eclone().

> 
> | I don't know if it makes more sense to change clone_with_pids() or have
> | [e]glibc wrappers swap the array contents.

I prefer to decide now on an order and stick to it in the kernel and
in glibc.

Oren

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers



