[Devel] Re: [RFC][v3][PATCH 7/7] Define clone_with_pids syscall

Oren Laadan orenl at cs.columbia.edu
Mon Jun 1 09:58:15 PDT 2009



Oren Laadan wrote:
> 
> Serge E. Hallyn wrote:
>> Quoting Sukadev Bhattiprolu (sukadev at linux.vnet.ibm.com):
>>
>> Bear with me as I try to get this straight...
>>
>> Let's say we have the following task being checkpointed (btw, the
>> actual checkpoint code hopefully will, as Oren mentioned, not
>> be always checkpointing all of a task's pids, but only the
>> pids from the checkpointer's pidns level an up - but that doesn't
>> matter to you):
>>
>> Either:
>>
>> level no    |    pid
>> 0               1001
>> 1                50
>> 2                 3
>>
>> or
>>
>> level no    |    pid
>> 0               1001
>> 1                50
>> 2                 1
>>
>> Now we want to restart that inside a container.  So for the
>> first task we have a parent task:
>>
>> level no       |     pid
>> 0                    5009
>> 1                    1000
>> 2                     49
>> 3                      2
>>
>> calling clone_with_pid with upids {1001,50,3} to produce task:
>>
>> level no       |     pid
>> 0                   5010
>> 1                   1001
>> 2                    50
>> 3                     3
>>
>> So the numbers in your code for this case become:
>>
>>> +	unum_pids = pid_set.num_pids;
>>         unum_pids = 3
>>> +	knum_pids = task_pid(current)->level + 1;
>>         knum_pids = 3 + 1 = 4
>>
>>> +	target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL);
>> XXX First problem:
>>         target_pids gets room for 5 pids, one more than we need.
> 
> The two issues are related, and this is intentional. The idea
> was that when you use CLONE_NEWPID, you imply a new nesting level
> and a pid==1 there. So you are not supposed to specify the pid
> of that new level.
> 
> IOW, the parent specify the pid's of the child from the _parent's_
> level and up (of the desired depth). CLONE_NEWPID creates a new
> pidns level below the parent's, and that is not covered in the
> array of pids.
> 
> By allocation an extra slot and forcing it to be 0, we ensure that
> the case of CLONE_NEWPID is covered correctly. Clearly, if this
> flag isn't set, then the extra slot is redundant (but doesn't hurt).
> 
>>> +	j = knum_pids - unum_pids;
>>         j = 1, so we copy the 3 pids in the right place.
>>
>>> +	rc = copy_from_user(&target_pids[j], pid_set.target_pids, size);
>>> +	if (rc) {
>>> +		rc = -EFAULT;
>>> +		goto out_free;
>>> +	}
>>> +
>>> +	return target_pids;
>>
>> For the second one, we have a parent task
>>
>> level no       |     pid
>> 0                   5009
>> 1                   1000
>> 2                    49
>>
>> calling clone_with_pid with CLONE_NEWPID and {1001,50,1} to produce:
>>
>> level no       |     pid
>> 0                   5010
>> 1                   1001
>> 2                    50
>> 3                    1
>>
>> So the numbers in your code become:
>>
>>> +	unum_pids = pid_set.num_pids;
>>         unum_pids = 3
> 
> This is a "bug" of the parent. The parent should specify the pids
> from the parent's level only and up, and not include the new level
> below that will be created. (After all, it will have to be 1).

Suka: maybe state this explicitly,too, in the patch description
and in the comments in the code ?   Like:

"The syscall operates relative to the parent's nesting level. If
CLONE_NEWPID is used as well, it will create an additional level
below. In this case the syscall indicates the desired pids at the
nesting levels seen by the parent."

Oren.

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list