[Devel] Re: [RFC][v4][PATCH 7/7]: Define clone_extended() syscall
Oren Laadan
orenl at librato.com
Thu Aug 6 08:37:17 PDT 2009
Serge E. Hallyn wrote:
> Quoting Sukadev Bhattiprolu (sukadev at linux.vnet.ibm.com):
>> Subject: [RFC][v4][PATCH 7/7]: Define clone_extended() syscall
>>
>> Container restart requires that a task have the same pid it had when it was
>> checkpointed. When containers are nested the tasks within the containers
>> exist in multiple pid namespaces and hence have multiple pids to specify
>> during restart.
>>
>> This patch defines a new system call, clone_extended(), which is like clone()
>> but takes a new 'pid_set' parameter. This parameter lets the caller choose
>> specific pid numbers for the child process, in the process's active and
>> ancestor pid namespaces. (Descendant pid namespaces in general don't matter
>> since processes don't have pids in them anyway, but see comments in
>> copy_target_pids() regarding CLONE_NEWPID).
>>
>> Unlike clone(), however, clone_extended() needs CAP_SYS_ADMIN, at least for
>> now, to prevent unprivileged processes from misusing this interface.
>
> It only needs that when specifying pids.
>
>> While the main motivation for this interface is the need to let a process
>> choose its 'pid numbers', the clone_extended() interface uses 64-bit clone
>> flags. The 'higher' portion of the clone flags is unused and is only
>> included to preclude yet another version of clone when a new clone flag is
>> needed.
>>
>> ===== Interface:
>>
>> Compared to clone(), clone_extended() needs to pass in three more pieces
>> of information:
>>
>> - an additional 32 bits of clone_flags
>> - number of pids in the set
>> - user buffer containing the list of pids.
>>
>> But since clone() already takes 5 parameters and some (all?) architectures
>> are restricted to 6 parameters per system call, additional data structures
>> (and copy_from_user()) are needed.
>>
>> The proposed interface for clone_extended() is:
>>
>> 	struct clone_tid_info {
>> 		void *parent_tid;	/* parent_tid_ptr parameter */
>> 		void *child_tid;	/* child_tid_ptr parameter */
>> 	};
>>
>> 	struct pid_set {
>> 		int num_pids;
>> 		pid_t *pids;
>> 	};
>>
>> 	int clone_extended(int flags_low, int flags_high, void *child_stack,
>> 			void *unused, struct clone_tid_info *tid_ptrs,
>> 			struct pid_set *pid_setp);
>
> I was thinking additional flags would be passed in the (renamed)
> struct pid_set.

Yes.

But maybe in a (renamed) 'struct clone_info' instead of 'struct pid_set'?

I vaguely recall a strong preference not to require copy_from_user()
during a fast-path clone, because it may hurt performance.

*If* this is the case, then maybe place the extra flags among the
"base" args, or at least have a CLONE_EXTRA flag indicate that more
arguments need to be pulled from user-space?
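
To make the CLONE_EXTRA idea concrete, here is a rough user-space sketch, not kernel code and not part of the patch. Everything in it is an assumption for illustration: the names CLONE_EXTRA_SKETCH, struct clone_info_sketch, fake_copy_from_user() and parse_clone_args() are hypothetical, the flag's bit value is arbitrary and not checked against the real clone flag layout, and the fixed-size pids[] array only keeps the example short. The point is just that the user copy happens only when the flag is set.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit meaning "more arguments live in a user buffer".
 * The value is arbitrary for this sketch. */
#define CLONE_EXTRA_SKETCH 0x80000000u

/* Hypothetical container for everything that does not fit in registers. */
struct clone_info_sketch {
	uint32_t flags_high;	/* upper 32 bits of the clone flags */
	int num_pids;		/* number of valid entries in pids[] */
	int pids[8];		/* fixed size only to keep the sketch simple */
};

/* Counts simulated user copies so the fast path can be observed. */
static int copies_done;

/* Stand-in for copy_from_user(): a plain memcpy with a NULL check. */
static int fake_copy_from_user(void *dst, const void *src, size_t len)
{
	if (src == NULL)
		return -EFAULT;
	memcpy(dst, src, len);
	copies_done++;
	return 0;
}

/*
 * Fill *out from the caller's arguments. Returns 0 on success or a
 * negative errno. Without CLONE_EXTRA_SKETCH nothing is copied and
 * *out is left zeroed, which models the unchanged fast path.
 */
static int parse_clone_args(uint32_t flags_low,
			    const struct clone_info_sketch *uinfo,
			    struct clone_info_sketch *out)
{
	int err;

	memset(out, 0, sizeof(*out));
	if (!(flags_low & CLONE_EXTRA_SKETCH))
		return 0;	/* fast path: no user copy at all */

	err = fake_copy_from_user(out, uinfo, sizeof(*out));
	if (err)
		return err;
	if (out->num_pids < 0 || out->num_pids > 8)
		return -EINVAL;
	return 0;
}
```

A caller that never sets the flag pays only for one extra bit test; a restart tool would set the flag and pass the buffer with its chosen pids.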

Do you intend to get feedback from LKML too?

Oren.