[Devel] Re: [RFC][v6][PATCH 9/9]: Document clone_with_pids() syscall

Randy Dunlap randy.dunlap at oracle.com
Thu Sep 10 08:26:59 PDT 2009


On Wed, 9 Sep 2009 23:14:13 -0700 Sukadev Bhattiprolu wrote:

> 
> Subject: [RFC][v6][PATCH 9/9]: Document clone_with_pids() syscall
> 
> This gives a brief overview of the clone_with_pids() system call.  We should
> eventually describe more details either in clone(2) or in a new man page.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev at vnet.linux.ibm.com>
> ---
>  Documentation/clone-with-pids |   58 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 58 insertions(+)
> 
> Index: linux-2.6/Documentation/clone-with-pids
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6/Documentation/clone-with-pids	2009-09-09 21:53:30.000000000 -0700
> @@ -0,0 +1,58 @@
> +
> +struct pid_set {
> +	unsigned int num_pids;
> +	pid_t pids[];
> +};
> +
> +clone_with_pids(int flags, void *child_stack_base, int *parent_tid_ptr,
> +			int *child_tid_ptr, NULL, struct pid_set *pid_setp)
> +
> +	The clone_with_pids() system call is identical to clone(), except
> +	that it allows the user to specify a pid for the child process
> +	in each of the child processes' pid name spaces.
> +
	                                    namespaces.  {as below}

> +	This system call is meant to be used when restarting an application
> +	from an earlier checkpoint. When restarting the application, the
> +	processes in the application must get the same pids they had at the
> +	time of the checkpoint.
> +
> +	The 'pid_setp' parameter defines a set of pids to use, one for each
> +	pid-namespace of the child process.  The order pids in '->pids[]'

	                                         order of pids

> +	corresponds to the nesting order of pid-namespaces, with ->pids[0]
> +	corresponding to the init_pid_ns.
> +
> +	If a pid in the ->pids list is 0, the kernel will assign the next
> +	available pid in the pid namespace, for the process.
> +
> +	If a pid in the ->pids[] list is non-zero, the kernel tries to assign
> +	the specified pid in that namespace.  If that pid is already in use
> +	by another process, the system call fails with -EBUSY.
> +
> +	On success, the system call returns the pid of the child process in
> +	the parent's active pid namespace.
> +
> +	On failure, clone_with_pids() returns -1 and sets 'errno' to one of
> +	following values (the child process is not created).
> +
> +	EPERM	Caller does not have the SYS_ADMIN privilege needed to excute

		                                                       execute

> +		this call.
> +
> +	EINVAL	The number of pids specified in 'pid_set.num_pids' exceeds
> +		the current nesting level of parent process
> +
> +	EBUSY	A requested 'pid' is in use by another process in that name
> +		space.
> +
> +Example:
> +
> +	struct pid_set pid_set { 3, {0, 99, 177} };
> +	void *child_stack = malloc(STACKSIZE);
> +
> +	/* set up child_stack, like with clone() */
> +	rc = clone_with_pids(clone_flags, child_stack, NULL, NULL, &pid_set);
> +
> +	if (rc < 0) {
> +		perror("clone_with_pids()");
> +		exit(1);
> +	}
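
As written, the example would probably not compile: the initializer is missing its '=', a flexible array member such as pids[] cannot portably be brace-initialized in an automatic variable, and the call passes five arguments where the prototype above lists six. A minimal sketch of building the pid_set on the heap instead, assuming the struct layout quoted above. make_pid_set() is a hypothetical helper for illustration and is not part of the patch:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Layout copied from the quoted documentation. */
struct pid_set {
	unsigned int num_pids;
	pid_t pids[];		/* flexible array member */
};

/*
 * make_pid_set() is a hypothetical helper, not part of the patch:
 * allocate a pid_set with room for n entries and copy them in.
 * Returns NULL on allocation failure; the caller releases the
 * result with free().
 */
struct pid_set *make_pid_set(unsigned int n, const pid_t *pids)
{
	struct pid_set *ps = malloc(sizeof(*ps) + n * sizeof(ps->pids[0]));

	if (!ps)
		return NULL;
	ps->num_pids = n;
	if (n)
		memcpy(ps->pids, pids, n * sizeof(ps->pids[0]));
	return ps;
}
```

With the values from the example, the caller would fill a local array with { 0, 99, 177 }, call make_pid_set(3, array), and pass the returned pointer as the last argument of clone_with_pids().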

What happens when one of the pids is busy, say the last one in the
example above [177]?  Are the first 2 children already cloned,
or are all pids checked for availability before cloning?
If the latter, is there a race there?
And what value is returned?
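
To make the question concrete: per the quoted text, one call creates a single child with one pid per namespace level, so the concern is whether the pids at the earlier levels (0 and 99 here) are already taken when 177 turns out to be busy. One possible all-or-nothing behaviour can be sketched as a userspace model with rollback on failure. Everything below (struct ns_model, MAX_PID, ns_alloc(), alloc_pids()) is hypothetical and only illustrates the semantics being asked about; it is not how the patch implements allocation, and a real implementation would additionally need locking to close the race.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_PID 256	/* hypothetical, tiny pid range for the model */

/* Hypothetical model of one pid namespace: which pids are in use. */
struct ns_model {
	bool used[MAX_PID];
};

/*
 * Take a specific pid (want != 0) or the lowest free one (want == 0).
 * Returns the pid, or -EINVAL / -EBUSY / -EAGAIN on failure.
 */
int ns_alloc(struct ns_model *ns, int want)
{
	if (want < 0 || want >= MAX_PID)
		return -EINVAL;
	if (want) {
		if (ns->used[want])
			return -EBUSY;
		ns->used[want] = true;
		return want;
	}
	for (int pid = 1; pid < MAX_PID; pid++) {
		if (!ns->used[pid]) {
			ns->used[pid] = true;
			return pid;
		}
	}
	return -EAGAIN;
}

/*
 * Allocate one pid per level, ns[0] being the outermost namespace.
 * num_pids > depth is rejected with -EINVAL, mirroring the quoted text.
 * Simplification: only the first num_pids levels are touched.
 * On any failure, pids taken at earlier levels are released again, so
 * the call either allocates at every requested level or changes nothing.
 * On success, out[i] holds the pid allocated in ns[i].
 */
int alloc_pids(struct ns_model *ns, unsigned int depth,
	       const int *want, unsigned int num_pids, int *out)
{
	if (num_pids > depth)
		return -EINVAL;
	for (unsigned int i = 0; i < num_pids; i++) {
		int pid = ns_alloc(&ns[i], want[i]);

		if (pid < 0) {
			while (i-- > 0)
				ns[i].used[out[i]] = false;	/* roll back */
			return pid;
		}
		out[i] = pid;
	}
	return 0;
}
```

In this model, a request for { 0, 99, 177 } with 177 already in use at the innermost level returns -EBUSY and leaves the outer two levels untouched. The quoted text suggests userspace would see that as a return value of -1 with errno set to EBUSY, but it does not say whether earlier levels are rolled back.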

---
~Randy
LPC 2009, Sept. 23-25, Portland, Oregon
http://linuxplumbersconf.org/2009/
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers



