[Devel] Re: [RFC][v7][PATCH 0/9] Implement clone2() system call

Oren Laadan orenl at librato.com
Thu Oct 1 08:19:27 PDT 2009



Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at librato.com] wrote:
> | 
> | 
> | Sukadev Bhattiprolu wrote:
> | > === NEW CLONE() SYSTEM CALL:
> | > 
> | > To support application checkpoint/restart, a task must have the same pid it
> | > had when it was checkpointed.  When containers are nested, the tasks within
> | > the containers exist in multiple pid namespaces and hence have multiple pids
> | > to specify during restart.
> | > 
> | > This patchset implements a new system call, clone2() that lets a process
> | > specify the pids of the child process.
> | > 
> | > Patches 1 through 6 are helper patches, needed for choosing a pid for the
> | > child process.
> | > 
> | > Patch 8 defines a prototype of the new system call. Patch 9 adds some
> | > documentation on the new system call, some/all of which will eventually
> | > go into a man page.
> | > 
> | 
> | [...]
> | 
> | > 
> | > Based on these requirements and constraints, we explored a couple of system
> | > call interfaces (in earlier versions of this patchset) and currently define
> | > the system call as:
> | > 
> | > 	struct clone_struct {
> | > 		u64 flags;
> | > 		u64 child_stack;
> | > 		u32 nr_pids;
> | > 		u32 parent_tid;
> | > 		u32 child_tid;
> | 
> | So @parent_tid and @child_tid are pointers to userspace memory and
> | require 'u64' (and it won't hurt to make @reserved1 a 'u64' as well).
> 
> Well, if we make parent_tid and child_tid u64, we could move reserved1
> after ->nr_pids and leave it as a 32-bit value.

Sure. In any case, it won't hurt to leave a large reserved space -
someone may be thankful for it in the future ;)
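
A minimal sketch of the layout being discussed, assuming the field order
suggested above (reserved1 moved next to nr_pids, both tid fields widened).
It is not taken from the patchset; uint32_t/uint64_t stand in for the
kernel's u32/u64 so that it compiles in userspace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical revised layout, for illustration only.
 * uint32_t/uint64_t stand in for the kernel's u32/u64.
 */
struct clone_struct {
	uint64_t flags;
	uint64_t child_stack;
	uint32_t nr_pids;
	uint32_t reserved1;	/* moved after nr_pids, stays 32-bit */
	uint64_t parent_tid;	/* userspace pointer, widened to 64 bits */
	uint64_t child_tid;	/* userspace pointer, widened to 64 bits */
	uint64_t reserved2;
};

/*
 * The two 32-bit fields pair up, so every 64-bit field sits on an
 * 8-byte offset and the struct has no implicit padding.
 */
static_assert(offsetof(struct clone_struct, parent_tid) == 24,
	      "parent_tid should start at byte 24");
static_assert(sizeof(struct clone_struct) == 48,
	      "struct should be 48 bytes with no padding");
```

With this ordering the size is the same for 32-bit and 64-bit callers,
since no field depends on the width of a native pointer.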

> 
> | 
> | > 		u32 reserved1;
> | > 		u64 reserved2;
> | > 	};
> | > 
> | 
> | Also, for forward/backward compatibility, explicitly state in the
> | documentation, and enforce in the kernel, that flags which are not
> | defined must not be set, and that reserved{1,2} must remain 0.
> 
> Agree with checking for reserved1 and reserved2.
> 
> We currently don't check for invalid clone_flags - we just ignore them.
> Adding checks like
> 
> 	if (fls(kcs.flags) > fls(CLONE_LAST_FLAG))
> 
> would assume we always use bits in order (while it seems to make sense, to
> use them in order, we don't seem to have done so in the past).
> 
> Alternatively we could define a CLONE_FLAG_MASK of valid flags and update
> the mask when each new clone flag is added. 
> 
> But do we really need to check for invalid flags ?

I'd go for a mask.

The idea is that we want to educate userspace to _not_ use unused
flags now. For if userspace sets an unused flag now and we let it
be, the application will break when we give meaning to that flag.
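
A minimal sketch of such a check, assuming a mask of valid flags. The
EX_* names, the flag values, and the helper are all hypothetical and
only illustrate the idea; they are not the kernel's CLONE_* definitions
or code from the patchset:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical flag bits, deliberately not contiguous, to show that a
 * mask does not rely on bits being allocated in order.
 */
#define EX_CLONE_A	0x001ULL
#define EX_CLONE_B	0x004ULL
#define EX_CLONE_C	0x100ULL

/* Updated whenever a new flag is defined. */
#define EX_CLONE_FLAG_MASK	(EX_CLONE_A | EX_CLONE_B | EX_CLONE_C)

/* Only the fields needed for the check; hypothetical subset. */
struct ex_clone_args {
	uint64_t flags;
	uint32_t reserved1;
	uint64_t reserved2;
};

/* Returns 0 if the arguments are acceptable, -EINVAL otherwise. */
static int ex_check_clone_args(const struct ex_clone_args *kcs)
{
	/* Reject any bit that has no defined meaning yet. */
	if (kcs->flags & ~EX_CLONE_FLAG_MASK)
		return -EINVAL;

	/* Reserved fields must stay zero so they can be given meaning later. */
	if (kcs->reserved1 || kcs->reserved2)
		return -EINVAL;

	return 0;
}
```

Here a caller that sets bit 0x2 gets -EINVAL today, instead of silently
changing behaviour on the day that bit is assigned.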

Oren.




